Search for: All records

Creators/Authors contains: "Oh, S"


  1. The assumption across nearly all language model (LM) tokenization schemes is that tokens should be subwords, i.e., contained within word boundaries. While providing a seemingly reasonable inductive bias, is this common practice limiting the potential of modern LMs? Whitespace is not a reliable delimiter of meaning, as evidenced by multi-word expressions (e.g., "by the way"), crosslingual variation in the number of words needed to express a concept (e.g., "spacesuit helmet" in German is "raumanzughelm"), and languages that do not use whitespace at all (e.g., Chinese). To explore the potential of tokenization beyond subwords, we introduce a "superword" tokenizer, SuperBPE, which incorporates a simple pretokenization curriculum into the byte-pair encoding (BPE) algorithm to first learn subwords, then superwords that bridge whitespace. This brings dramatic improvements in encoding efficiency: when fixing the vocabulary size to 200k, SuperBPE encodes a fixed piece of text with up to 33% fewer tokens than BPE on average. In experiments, we pretrain 8B transformer LMs from scratch while fixing the model size, vocabulary size, and train compute, varying *only* the algorithm for learning the vocabulary. Our model trained with SuperBPE achieves an average +4.0% absolute improvement over the BPE baseline across 30 downstream tasks (including +8.2% on MMLU), while simultaneously requiring 27% less compute at inference time. In analysis, we find that SuperBPE results in segmentations of text that are more uniform in per-token difficulty. Qualitatively, this may be because SuperBPE tokens often capture common multi-word expressions that function semantically as a single unit. SuperBPE is a straightforward, local modification to tokenization that improves both encoding efficiency and downstream performance, yielding better language models overall. 
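The two-stage curriculum described in the SuperBPE abstract (subwords first, then superwords that bridge whitespace) can be illustrated with a toy sketch. This is not the authors' implementation: it works on characters rather than bytes, the function names (`run_merges`, `train_superbpe_sketch`) are hypothetical, and the point at which training switches stages is simply passed in as two merge budgets.

```python
import re
from collections import Counter


def count_pairs(seqs):
    """Count adjacent token pairs inside each sequence (never across sequences)."""
    pairs = Counter()
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs


def apply_merge(seq, pair):
    """Replace every non-overlapping occurrence of `pair` in `seq` with one token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def run_merges(seqs, num_merges):
    """Greedy BPE loop: repeatedly merge the most frequent adjacent pair."""
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(seqs)
        if not pairs:
            break
        # Ties are broken lexicographically so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        seqs = [apply_merge(s, best) for s in seqs]
    return seqs, merges


def train_superbpe_sketch(text, subword_merges, superword_merges):
    """Toy two-stage curriculum (assumed structure, for illustration only)."""
    # Stage 1: whitespace pretokenization. Each chunk is a word with its
    # leading whitespace attached, so merges cannot cross word boundaries.
    chunks = re.findall(r"\s*\S+", text)
    seqs, stage1 = run_merges([list(c) for c in chunks], subword_merges)
    # Stage 2: drop pretokenization. All tokens form one sequence, so
    # merges may now bridge whitespace and produce "superwords".
    flat = [tok for seq in seqs for tok in seq]
    (tokens,), stage2 = run_merges([flat], superword_merges)
    return tokens, stage1 + stage2


tokens, merges = train_superbpe_sketch("by the way by the way by the way", 20, 1)
print(tokens)
```

With only the first stage, every token stays inside a word; allowing even one second-stage merge lets a multi-word unit such as " the way" become a single token, which is the source of the shorter encodings the abstract reports.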
  2. Diffusion models (DMs) create samples from a data distribution by starting from random noise and iteratively solving a reverse-time ordinary differential equation (ODE). Because each step in the iterative solution requires an expensive neural function evaluation (NFE), there has been significant interest in approximately solving these diffusion ODEs with only a few NFEs without modifying the underlying model. However, in the few NFE regime, we observe that tracking the true ODE evolution is fundamentally impossible using traditional ODE solvers. In this work, we propose a new method that learns a good solver for the DM, which we call Solving for the Solver (S4S). S4S directly optimizes a solver to obtain good generation quality by learning to match the output of a strong teacher solver. We evaluate S4S on six different pre-trained DMs, including pixel-space and latent-space DMs for both conditional and unconditional sampling. In all settings, S4S uniformly improves the sample quality relative to traditional ODE solvers. Moreover, our method is lightweight, data-free, and can be plugged in black-box on top of any discretization schedule or architecture to improve performance. Building on top of this, we also propose S4S-Alt, which optimizes both the solver and the discretization schedule. By exploiting the full design space of DM solvers, with 5 NFEs, we achieve an FID of 3.73 on CIFAR10 and 13.26 on MS-COCO, representing a 1.5× improvement over previous training-free ODE methods. 
  3. Theory and observations reveal that the circumgalactic medium (CGM) and the cosmic web at high redshifts are multiphase, with small clouds of cold gas embedded in a hot, diffuse medium. We study the ‘shattering’ of large, thermally unstable clouds into tiny cloudlets of size $$\ell_{\rm shatter} \sim \min(c_{\rm s} t_{\rm cool})$$ using idealized numerical simulations. We expand upon previous works by exploring the effects of cloud geometry (spheres, streams, and sheets), metallicity, and an ionizing ultraviolet background. We find that ‘shattering’ is mainly triggered by clouds losing sonic contact and rapidly imploding, leading to a reflected shock that causes the cloud to re-expand and induces Richtmyer–Meshkov instabilities at its interface. The fragmented cloudlets experience a drag force from the surrounding hot gas, leading to recoagulation into larger clouds. We distinguish between ‘fast’ and ‘slow’ coagulation regimes. Sheets are always in the ‘fast’ coagulation regime, while streams and spheres transition to ‘slow’ coagulation above a critical overdensity, which is smallest for spheres. Surprisingly, $$\ell_{\rm shatter}$$ does not appear to be a characteristic clump size even if it is well resolved. Rather, fragmentation continues until the grid scale with a mass distribution of $$N(>m) \propto m^{-1}$$. We apply our results to cold streams feeding massive ($$M_{\rm v} \gtrsim 10^{12}\,{\rm M}_\odot$$) galaxies at $$z \gtrsim 2$$ from the cosmic web, finding that streams likely shatter upon entering the hot CGM through the virial shock. This could explain the large clumping factors and covering fractions of cold gas around such galaxies, and may be related to galaxy quenching by preventing cold streams from reaching the central galaxy.
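The scale $$\ell_{\rm shatter} \sim \min(c_{\rm s} t_{\rm cool})$$ quoted in the abstract above can be evaluated numerically once a cooling function is chosen. The sketch below is only illustrative and rests on assumptions not stated in the abstract: an ideal gas cooling at fixed pressure, the common convention $$t_{\rm cool} = (3/2)\,k_{\rm B} T / (n \Lambda)$$, a mean molecular weight of 0.6, and a caller-supplied cooling function. All function names are hypothetical.

```python
import math

K_B = 1.380649e-16    # Boltzmann constant in erg/K (cgs)
M_P = 1.67262192e-24  # proton mass in g (cgs)


def sound_speed(T, mu=0.6, gamma=5.0 / 3.0):
    """Adiabatic sound speed in cm/s for an ideal gas at temperature T (K)."""
    return math.sqrt(gamma * K_B * T / (mu * M_P))


def cooling_time(T, n, lam):
    """Cooling time in s, assuming t_cool = (3/2) k T / (n * Lambda)."""
    return 1.5 * K_B * T / (n * lam)


def shatter_length(temps, cooling_fn, pressure_over_k):
    """Minimum of c_s * t_cool (cm) over `temps`, for gas at fixed P/k_B.

    `cooling_fn(T)` must return Lambda(T) in erg cm^3 / s; it is supplied by
    the caller because no cooling curve is given in the source text.
    """
    return min(
        sound_speed(T) * cooling_time(T, pressure_over_k / T, cooling_fn(T))
        for T in temps
    )


# Usage with a constant placeholder Lambda (NOT a physical cooling curve).
length_cm = shatter_length([1e4, 1e5, 1e6], lambda T: 1e-22, pressure_over_k=1e3)
print(f"{length_cm:.3e} cm")
```

In this convention the length scales inversely with pressure, so denser environments at the same temperature give smaller cloudlets.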
  4. While it is well known that cosmic rays (CRs) can gain energy from turbulence via second-order Fermi acceleration, how this energy transfer affects the turbulent cascade remains largely unexplored. Here, we show that damping and steepening of the compressive turbulent power spectrum are expected once the damping time $$t_{\rm damp} \sim \rho v^2/\dot{E}_{\rm CR} \propto E_{\rm CR}^{-1}$$ becomes comparable to the turbulent cascade time. Magnetohydrodynamic simulations of stirred compressive turbulence in a gas-CR fluid with diffusive CR transport show clear imprints of CR-induced damping, saturating at $$\dot{E}_{\rm CR} \sim \tilde{\epsilon}$$, where $$\tilde{\epsilon}$$ is the turbulent energy input rate. In that case, almost all of the energy in large-scale motions is absorbed by CRs and does not cascade down to grid scale. Through a Hodge–Helmholtz decomposition, we confirm that purely compressive forcing can generate significant solenoidal motions, and we find preferential CR damping of the compressive component in simulations with diffusion and streaming, rendering small-scale turbulence largely solenoidal, with implications for thermal instability and proposed resonant scattering of $$E \gtrsim 300$$ GeV CRs by fast modes. When CR transport is streaming dominated, CRs also damp large-scale motions, with kinetic energy reduced by up to 1 order of magnitude in realistic $$E_{\rm CR} \sim E_{\rm g}$$ scenarios, but turbulence (with a reduced amplitude) still cascades down to small scales with the same power spectrum. Such large-scale damping implies that turbulent velocities obtained from the observed velocity dispersion may significantly underestimate turbulent forcing rates, i.e., $$\tilde{\epsilon} \gg \rho v^3/L$$.
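The time-scale comparison behind the damping criterion in the abstract above can be written out as an order-of-magnitude sketch. It assumes a cascade time of roughly $$L/v$$ at the outer scale $$L$$ and a steady-state energy budget, neither of which the abstract states explicitly.

```latex
\begin{align}
  t_{\rm damp} &\sim \frac{\rho v^2}{\dot{E}_{\rm CR}},
  \qquad
  t_{\rm casc} \sim \frac{L}{v} \quad \text{(assumed)}, \\
  t_{\rm damp} \lesssim t_{\rm casc}
  &\iff \dot{E}_{\rm CR} \gtrsim \frac{\rho v^3}{L}, \\
  \tilde{\epsilon} &\approx \dot{E}_{\rm CR} + \frac{\rho v^3}{L}
  \quad \text{(assumed steady state)}, \\
  \dot{E}_{\rm CR} \to \tilde{\epsilon}
  &\implies \frac{\rho v^3}{L} \ll \tilde{\epsilon}.
\end{align}
```

Read this way, a forcing rate inferred from the observed velocity dispersion as $$\rho v^3/L$$ only counts the energy that still cascades, which is why it can fall well below the true input rate $$\tilde{\epsilon}$$ when CR damping is strong.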
  5. We investigate how cosmic rays (CRs) affect thermal and hydrostatic stability of circumgalactic (CGM) gas, in simulations with both CR streaming and diffusion. Local thermal instability can be suppressed by CR-driven entropy mode propagation, in accordance with previous analytic work. However, there is only a narrow parameter regime where this operates, before CRs overheat the background gas. As mass dropout from thermal instability causes the background density and hence plasma $$\beta \equiv P_{\rm g}/P_B$$ to fall, the CGM becomes globally unstable. At the cool disk to hot halo interface, a sharp drop in density boosts Alfvén speeds and CR gradients, driving a transition from diffusive to streaming transport. CR forces and heating strengthen, while countervailing gravitational forces and radiative cooling weaken, resulting in a loss of both hydrostatic and thermal equilibrium. In lower $$\beta$$ halos, CR heating drives a hot, single-phase diffuse wind with velocities $$v \propto (t_{\rm heat}/t_{\rm ff})^{-1}$$, which exceeds the escape velocity when $$t_{\rm heat}/t_{\rm ff} \lesssim 0.4$$. In higher $$\beta$$ halos, where the Alfvén Mach number is higher, CR forces drive multi-phase winds with cool, dense fountain flows and significant turbulence. These flows are CR dominated due to ‘trapping’ of CRs by weak transverse B-fields, and have the highest mass loading factors. Thus, local thermal instability can result in winds or fountain flows where either the heat or momentum input of CRs dominates.
  6. Spurred by rich, multiwavelength observations and enabled by new simulations, ranging from cosmological to subparsec scales, the past decade has seen major theoretical progress in our understanding of the circumgalactic medium (CGM). We review key physical processes in the CGM. Our conclusions include the following: ▪ The properties of the CGM depend on a competition between gravity-driven infall and gas cooling. When cooling is slow relative to free fall, the gas is hot (roughly virial temperature), whereas the gas is cold ($$T \sim 10^4$$ K) when cooling is rapid. ▪ Gas inflows and outflows play crucial roles, as does the cosmological environment. Large-scale structure collimates cold streams and provides angular momentum. Satellite galaxies contribute to the CGM through winds and gas stripping. ▪ In multiphase gas, the hot and cold phases continuously exchange mass, energy, and momentum. The interaction between turbulent mixing and radiative cooling is critical. A broad spectrum of cold gas structures, going down to subparsec scales, arises from fragmentation, coagulation, and condensation onto gas clouds. ▪ Magnetic fields, thermal conduction, and cosmic rays can substantially modify how the cold and hot phases interact, although microphysical uncertainties are presently large. Key open questions for future work include the mutual interplay between small-scale structure and large-scale dynamics, and how the CGM affects the evolution of galaxies.
  7. Astrophysical gases such as the interstellar, circumgalactic, or intracluster medium are commonly multiphase, which raises the question of how these systems are structured. While there are many known processes leading to fragmentation of cold gas embedded in a (turbulent) hot medium, in this work, we focus on the reverse process: coagulation. This is often seen in wind-tunnel and shearing layer simulations, where cold gas fragments spontaneously coalesce. Using 2D and 3D hydrodynamical simulations, we find that sufficiently large ($$\gg c_{\rm s} t_{\rm cool}$$), perturbed cold gas clouds develop pulsations which ensure cold gas mass growth over an extended period of time ($$\gg r/c_{\rm s}$$). This mass growth efficiently accelerates hot gas which in turn can entrain cold droplets, leading to coagulation. The attractive inverse square force between cold gas droplets has interesting parallels with gravity; the ‘monopole’ is surface area rather than mass. We develop a simple analytic model which reproduces our numerical findings.